phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

Abstract

A picture is worth a thousand words. Not until recently, however, did we see some success stories in the understanding of visual scenes: a model that is able to detect and name objects, describe their attributes, and recognize their relationships and interactions. In this paper, we propose a phrase-based hierarchical Long Short-Term Memory (phi-LSTM) model to generate image descriptions. The proposed model encodes a sentence as a sequence combining phrases and words, rather than as a sequence of words alone as in conventional solutions. The two levels of the model are dedicated to i) learning to generate image-relevant noun phrases, and ii) producing an appropriate image description from the phrases and the other words in the corpus. Adopting a convolutional neural network to learn image features and the LSTM to learn the word sequence in a sentence, the proposed model shows better or competitive results compared with state-of-the-art models on the Flickr8k and Flickr30k datasets.
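
The two-level sentence encoding described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the noun-phrase spans have already been found by some external chunker, and the function name `split_caption`, the placeholder token `<NP>`, and the example caption are all hypothetical. It only shows how a caption might be separated into phrase-level sequences and a sentence-level sequence in which each noun phrase is a single unit.

```python
from typing import List, Tuple

# Hypothetical placeholder marking where a noun phrase sits in the
# sentence-level sequence; the paper does not specify this token.
PHRASE_TOKEN = "<NP>"


def split_caption(
    tokens: List[str], np_spans: List[Tuple[int, int]]
) -> Tuple[List[List[str]], List[str]]:
    """Split a caption into phrase-level and sentence-level sequences.

    np_spans holds (start, end) token indices with end exclusive, and is
    assumed to be sorted and non-overlapping.
    """
    phrases: List[List[str]] = []
    sentence: List[str] = []
    pos = 0
    for start, end in np_spans:
        # Words between noun phrases stay in the sentence-level sequence.
        sentence.extend(tokens[pos:start])
        # The noun phrase becomes its own lower-level word sequence.
        phrases.append(tokens[start:end])
        # The sentence level sees the whole phrase as one unit.
        sentence.append(PHRASE_TOKEN)
        pos = end
    sentence.extend(tokens[pos:])
    return phrases, sentence


# Made-up caption with hand-labelled noun-phrase spans.
tokens = "a dog chases a red ball on the grass".split()
np_spans = [(0, 2), (3, 6), (7, 9)]
phrases, sentence = split_caption(tokens, np_spans)
print(phrases)
print(sentence)
```

In a setup like this, the lower level would be trained on the entries of `phrases` and the upper level on `sentence`, with each phrase slot filled by a representation of the corresponding phrase; how that representation is computed is left to the paper itself.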
